[57]



ABSTRACT 



A high-performance RAID system for a PC comprises a 
controller card which controls an array of ATA disk drives. 
The controller card includes an array of automated disk 
drive controllers, each of which controls one respective disk 
drive. The disk drive controllers are connected to a microcontroller
by a control bus and are connected to an automated
coprocessor by a packet-switched bus. The coprocessor
accesses system memory and a local buffer. In operation,
the disk drive controllers respond to controller commands 
from the microcontroller by accessing their respective disk 
drives, and by sending packets to the coprocessor over the 
packet-switched bus. The packets carry I/O data (in both 
directions, with the coprocessor filling in packet payloads
on I/O writes), and carry transfer commands and target 
addresses that are used by the coprocessor to access the 
buffer and system memory. The packets also carry special 
completion values (generated by the microcontroller) and 
I/O request identifiers that are processed by a logic circuit of 
the coprocessor to detect the completion of processing of 
each I/O request. The coprocessor grants the packet- 
switched bus to the disk drive controllers using a round robin 
arbitration protocol which guarantees a minimum I/O band- 
width to each disk drive. This minimum I/O bandwidth is 
preferably greater than the sustained transfer rate of each 
disk drive, so that all drives of the array can operate at the 
sustained transfer rate without the formation of a bottleneck. 

47 Claims, 9 Drawing Sheets 
example, some RAID array controllers rely on the use of a

DISK ARRAY CONTROLLERWrTH expensive microcontroller that can process I/O 

AUTOMATED PROCESSOR WHICH S?« . high transfer rate. Other designs rely on complex 

-r.„ ,,n r>»TA ArrnnniNG TO oala *' a . 8 . , ,u» nf i-.vnensive 

KUU i ca i/w pprpivFD disk drive interlaces, anu mus i^uuv , 

ADDRESSES AND COMMANDS RECEIVED 

FROM DISK DRIVE CONTROLLERS ™ presem mvention addresses these and other hm.ta- 

PRIORITY CLAIM tions in existing RAID architectures. 

Tnis application claims the benefit of U.S. Provisional SUMMARY OF THE INVENTION 

aJS a ton Ser. No. 60/065,848, filed Nov. 14. 1997. titled 10 provides . high-performance archt- 

mGH PERFORMANCE ARCHITECTURE FOR DISK J hardware-implemented RAID or other dak 

array SYSTEM arrav system. An important benefit of the architecture b tha 

ARRAY SYSTEM. "SisT high degree of performance (both transanal 

FIELD OF THE INVENTION 1^^) without the need for disk drives that are 

-n, c,„, invention relates to disk arrays, and more is base d on expensive or complex disk drive interfaces^ 

software architectures ,„ , eterred embodiment, the architecture » embod.ed 

^emen.edR^ID(Redundan,Array of inex- ^ \ PC . based disk array system whic com ^s an 

pensive rjisks) L other «* array systems. ^g^lS^SS^ 

BACKGROUND OF THE INVENTION 20 ^ disk dri ve controllers, each of which controls a single, 

"software RAID." With software RAID, software (typicaUy he commands received from the disk drive 
p"t of the operating system) which run, on , 0* hos com ^ ^ of the coproce sor is to control 
outer is used to implement the various RAID con rol tone ' on "™L rf jve controllers , 0 the packet- 
Tons. Tnese control functions include, for «*^«; S Jo hereby control the flow of I/O data, 
ating drive-specific read/write reques Scroller card further includes a microcontroller 
striping algorithm, reconstructing lost data when drive tail The ~™°" er dfek drive controllers and to the 
ures occur and generating and checking parity Because which ^ ° 7 " ^ The microcontroUer runs 

of parity information occupies bandwidth on the system bus, ' P^* fae microcontroller does not process or 

Software RAID frequently produces a degradation in per- 1/0 dala (as described below), 

formance over single disk drive systems. _ f lo ™, low-performance microcontroller can advanta- 

JSZ1SSZ%£ 40 ^1 r the control card processes multiple ,/0 

presents the array to the host c°^ u ^^n 31 £ S "ho t computer..* microcontroller gen- 

te disk drive. Because little or no host CPU bandwiain 4J received irom u r controller commands 

uLd o perform the RAID control functions, and because no crates "^^^^Sd configuration), and dis- 

RAID p'arity traffic flows across the system bus, little or no ^^J^^,^ over the local control 

degradation in performance occurs. i ,T<L Hi,k drive controllers. In addition to containing 

One potential benefit of RAID systems is that the mom tote ** dnve^ ^ 

outnut f'W) data can be transferred to and from mul Uple 50 f« ™^ "° ds ' and , arg et addresses that are 

^drives In parallel. By exploiting AJ , P aral.eU,m -s ^-^nds and^g ^ ^ daia 

(narticularly within a hardware-implemented RAIL) ;; sl6m memory and the local buffer, 

system) it impossible to achieve a higher degree of perfor- to and from systea m ' J & abo inchlde disk 

mince han is possible with a single disk drive. The two Some of the comro i identifiers) that 

Sypes of P-formance that can potently be mcrcase 55 compleUo Jj^^J^k completion status 

are the number of I/O requests processed per second are used By inec p completion values are 

'HraLction^^ 

of I/O data transferred per second ( streaming g ueraUOD , ^ ^ ^ ^ disk compleUon va i u6S 

performance"). _ m , ^ L r a eiven I/O request produces a final completion value 

P Unfortunately, few hardware-implemented RAID systems 60 h » P«^ J P q (he coprocessor . As desenbed 

provide an appreciable increase in ^°<™™J°™1 Sables'the coprocessor to detect the completion 

cases, this failure to provide a performance ;™P»^*£ 0 f processing of an I/O request without prior knowledge of 

;sr '^rfoir ca'n rz? sss - V ° f inv ° ked disk dnves - eic) ot thc 1/0 

frequent interrupt of the host computer's processor. 6S reque . cnuoUer commands, the disk drive 

,„ addition, attempts to increase V^^^ their respective disk drives and send 

relied on the use of expens.ve hardware components, hor 
Teh dSkS bandwidth is equal to UN of „ the I/O data with the automated process to the

toTal Jo bandwidth of the packet-switched bus, where N is memory according , 0 the transfer commands and he 
r D u^ofd, k d ri ve TO otro„e f s(andd, k drives)mthe 

2xt2Si£ ^dS^S K „ BRIEF DESCRIPTION OF THE DRAWINGS 

drives can operate concurrently at the sustained transfer rate ^ ^ ^ feaWres of , n6 architecture will now be 

indefinitely without the formation of a bottleneck. When he fa ^ ^ refcrencc l0 thc drawmg s of 
packet-switched bus is not being used by all of the disKarivc ^ pre f crre d embodiment, in which: 

controllers (i.e., one or more disk drive centre filers has no a ^ d;sk afray architecture . 

packets to ^^^^^""^Si 20 FIG. 2 illustrates a disk array system in accordance with 

disk dnve controller to use more than th Q g^^ ^ Qf ^ pres6nt 

ZnJ^fb?SSte.' Ulster 1/0 data a, rate higher P p IG . 3 uiuslrates the general flowtf information between 

Z£^tiS«™»^ en,tere, ^ ue,,edI/0d,,,, the primary components of the FIG. 2 system. 

rSs in the S^k drive's cache. . 2 s FIG. 4 illustrates the types of information included within 

The disk drive controllers process their respective lh6 controller commands, 

sequences of controller commands a synchronously to one pjQ 5 illustrates a format used for the transmission ot 

another; thus, the dUk drive controllers that are invoked by s 

a given I/O request can finish processing the I/O nQ fi iUustrat6S the architecture of the system in further 

any order. When a given disk drive controUer finishes 30 • 

processing an I/O request, the controUer sends > a special Q flow & which illustrates a round robm 

completion packet to the coprocessor. This completion j ^ h ^ used , 0 conlrol access to the 

packet contains the completion value that was aligned to ^^^^J n0 . 2 . 

the disk drive controller, and contains an identifier (token) pacto ^ completion logic M of F1G . 6 ia 

the I/O request. " E1 * 

to the final completion value. If a match occurs, indicating „ of mG . 9 . 

One aspect of the invention is thus a d,sk array controller ^ ^ W RAW V ^ nQ ^ lhe archi . 

which operative* connects a ^Y^^SS w S , controller card 30 ("array 

disk drives. The disk array controller comprises a plurali y les ar / array 0 f SCSI (Small Computer 
of disk drive controllers, each of which connects to and is 50 ^JJ^J^ffidrives 32 to a host computer (PC) 

configuredtocontrolatleastonediskdriveoflhearrayje Systems Interface) mtQ a pcl (Per i pheral 

disk array controller further comprises a ^roconUolk gj^^^c,) expansion slot of the host com- 

which is responsive to I/O request generated by the host C°m^enUn^co , t ^ ^ ^ ^ ^ 

computer. The microcontroller dispatches controller com- puter 34 ana ' »«™» a fc 42 For urposes 0 f 

Stotei*^™,^™^^^ « f h y u e de3n and ' ,b! description of J preferred 

transfers of I/O data between the disk dnves am 1 lhe host his d escnpt on ana ^ ^ ^ ^ 

computer. At least some of the control ler commands inch.de ff^^. 1 PenUum™ or other X86-compatible 

system memory addresses for performing the transfers^ ,s ° • ^ P nt.um ^ fe ^ 

«¥r con s^;iuS^s:c= « °rei~windows™ 95 or , he m 

«? *" " ^L S =., ler 30 includes a PCI-to-PCI bridge 44 

automated processors transfers the I/O data between at least The pa bus 42 to a local PCI bus 46 of 

the disk drive controllers and the system memory in which couple * host PL. bus masler with respect 

response .0 transfer commands and targe, system memory the ^0 and £tad»c mQre ^ ^ 

addresses received from the disk dnve controllers. 65 0 toll busse ^,4 tQ ^ ^ pa bus 

Another aspect of the invention is a method of process mg OJ^* 0 ^ ^ 50 conlrols the operation of two or 
an I/O request from a host computer with a disK array 
5 . i „ . , p „ „ent WO transfers that are performed as part of each 1/0

more SCSI disk drives 32 via a respective shared I cable .52. nen J rformance miC rocoDtroUer generally must 

^•?=£!r^^ ^Asdl^dbe.ow^archit = 

58 will typically include appropriate exclus.ve-OR (XOR) 5 monu g (ask rf ^ w ^ and b 

logic 60 for performing the XOR operations associated with device^a, ^ ^ ^ ^ , /0 

To^rr^p^r * - £rt"^!^«J? -2 

rriro^ 

PCI-to-PCI bridge 44, and the local PCI bus 46. j^ c j\''5' ^ n er problem, in a. least some RAID 

request typically consist of a ^^Ts^, drive implementations, is that the microcontroller 56 interrupts the 

(CDB) and a scatter-galher list. The CDB is a SCSI drive > p 3g te ^ dunns , he processing of a 

command that specifies such parameters as lhe d.sk opera host p ft ^ common for the 

tion to be performed (e.g. read or write d ™ 15 56 to interrupt the host processor 38 a, least 

block address, and a transfer length. The matter-gather tens nu contiguous block of system memory refer- 

an address list of one of more contiguous blocks of system oncewr ^ ^ Bccause , herc is signlficaol 

memory for performing the I/O operation. overhead associated with the processing of an interrupt, the 

The microcontroller 56 runs a firmware program which °* e ™° fe significantly detracts from the 

translates these I/O requests ^.^P-'^S 2 ° ^"^bandwidth that is available for handling other 

s^sa'^ss^S birpro 1 ^ t^r^rs 

b ; the system, a given 1/0 request requires data to* read 25 pe ^0 in many raid architectures, is that 

from eve'ry SCSI drive 32 of the W^™^*** ^J, contr o,.er 30 generates an interrupt reques 
56sendsSCSIcommandstoeachoftheSCSI^ > 3 the afray ^iier suspends 

The SCSI controllers in-mrn arbitrate fo^ J ^ generaling we following 

PCI bus 46 to transfer I/O data between the SCS date 32 opera ^ ng st has 

and system memory 40. I/O data that is bemg transf^ed 30 £*™P^ ^ , tial ^eck in the flow 

from system memory 40 to the disk drives 32 is initially ^n semceo ^ ^ ^ 

stored in the buffer 58. The buffer 58 is a ^ typically ^ of W data and mc^ ^ ^ B 
to perform XOR operations, rebuild operations (in re ponse hat ne to provide an archlt6C . 

particular RAID configuration. The microcontroller 56 also 35 wre: m , B 

Ld«U.p«oortofU»di^^co«^ JSJI^ device driver can process multiple 

and interrupts the host processor 38 to notify the device pra * ^ ^ ^ ^ processor 6veDtua ll y 

driver of completed transfer operations. services an interrupt request. 

The FIG. 1 architecture suffers from several deficiencies * £ ides a high performance d,sk 

lhat are addressed by the present invention. One such 40 which presses these and other problems 

deficiency is that the SCSI drives 32 « £ wTh%S n XJd systems. An important aspect of .he 

comparison to ATA (AT Attachment) dnves. Whde U is wjnjpn > performance benefits provided 

possible to replace the SCSI drives with ^ bilecture L no. tied to a particular type of disk 

drives (see, for example, U.S. Pa.. No. 5,506,977), tte use by m ^ b imp i emc „ted 

of ATA drives would generally result n a decrease in an fcrred em bod.ment described 

performance. One reason for ^^^^^^ be 0 l) and other types of relatively low-cost dnves while 

LtATAcMves do not buffer mul^ priding a high level of performance. 

eachATAdrive would normally remain inactive whde a new prov. g t 

command is being retrieved from the microcontroUer » U ^ k , em which emb odies the various features 

One goal of the present invention is thus to provide an 50 Ad.sk anay y ^ ^ refct _ 

Architecture in which ATA and other lowest dnves can be of the present inven Thr oughou. this 

used while maintaining a high level of performance. description reference will be made to various 

Another problem with the FIG 1 architecture ,s ^ it the des nP^"- fic ^ jnc , uding> for examp i e , par , 

l0C al PCI bus and the shared cables 52 are suscept ble Mo icemen. v parameters, message 
being dominated by a single diskdnye 32. Such dominance 55 ^^ d ^ rfdatapaUB . 1 i iese deUih.«epH^ed 

can result in increased transactional latency and a corre orm forth ^ erred embodiment of a, 6 

sponding degradation io performance. A rela ed prob em s in ord r y ^ ^ ^ ^ Qf ^ . ^ 

that the local PCI bus 46 is used both for the transfe. ol in ^ forth . n ^ appendcd c aims . 

commands and the transfer of I/O data; increased command scope ot ^ co an 

traffic on the bus 46 can therefore adversely _afec t the 60 As depcte cor , lro Uer") that plugs into 

throughput and latency of data ^\^^Zi a PCl'o of the host computer 34. The array conUo.ler 70 

the architecture of the preferred embodi ^ , he bosl ^mputer to an array of ATA disk dnves 72 

these and other problems by using separate J^ bered X _ N in FIG . 2), with each drive connected to the 

busses, and by using a round-robin arbitration protocol to (° umB6 conlroUer b , respeclive ATA cable 76. In one 

grant the local data bus to individual < inves implementation, the array controller 70 includes eight ATA 
7 ... . l. FPr , A A,, anoUcalion-spedac integrated circuit (ASIC) or

use of a separate port per drive 72 enab es . .be drives * be FPGA- M apph^ P^,^ be used. The genera 

:C thTmany of the architectural features of the status of I/O requests. W.tb respect to the PCI 

Znit™ can be usld to increase the performance of disk b ^J 2 of the host computer 34jhe array coproassor 80 acts 

™ svstems that ule other types of drives, including SCSI J» pc , initiator (a tvpe 0 f PC 1 bus master) wh.ch initiates 

Hrivef I. wM ab Tbe recognized that the disclosed array as a ^ read J w ' rile operations based on commands 

conTroU 70 can be adaptedfor use with other types of disk « froffl ^ amomaled ^rollers 84^Th< ^operation of 

r V eT(including CD-ROM and DVD drives) and mass coprocessor 80 is further described be °w. 

s,o«L devices (including FLASH and other sohd state the ^ ^ ^ preferably eilher , x megabyte (MB) o 

memory drives). . ATArlr!v « 72 4 MB volatile, random access memory Synchronous 

in the preferred embodiment, the array °£ A ™ d ™ s D 7 J DRAM or synchronous SRAM may be used for h* pur- 
is operated as a RAID array using, for a r ^°* 20 " ah da u that is written from the host computer ^34 to 

or a RAID 5 configuration. Tbe array controUer 70 can 20 P^~ ^ initially written to this buffer 94. In addition, 

ternarively be configured through firmware to operate J ^^/sO uses this buffer 94 for volume 

drives using a non-RAID implementation, such as a JBOD the arrj ^ ^ ^ ^ ^ ^ § ^ of gQes b d > 

(Just a Bunch of Disks) configuration r * gener ation. Although the buffer 94 is external to 

With further reference to FIG. 2, the array «ntrol^70 and p yg ^ fa ^ prefcrred m> „ may 

includes an automated array coprocessor 80, ibmom aUe rnatively be integrated into the same chip. 

roHer 82, and an array of automated ^^"^ l£ ^controller 82 used in the preferred embodiment 

ATA drive 72), all of which are interconnected by a local I ne m microcontroller 82 is controlled by a 

control bus 86 that is used to transfer command and other » i^"^,, m (slor ed in me ROM 96) that 

control information. (As used herein, the term "autom t d ^T^^JJ, RAID or non-RAID storage proto- 

refers to a data processing urn. which °P« at « * 30 primary function performed by the m.croconUoll r 

fetching and executing sequences of ^JP^^J £ 0 lran P slale I/O requests from the host computer 34 into 

The aufomated controllers 84 are also connected to the array *to «r commands and o 

coprocessor 80 by a packet-switched bus 90. As farther seque ^ over ^ j , conUo , bus 86 to 

depicted in FIG. 2, the array coprocessor 80 » locally d»£t* controllers 84 for processing. As 

connected to a buffer 94 and the mioo^jote 82 ^ 35 sp^tc ^ ^ such ,„ 

locally connected to a read-only memory (ROM) 96 and a ^ ^ to directly momtor tne y 0 transfers 

random-access memory (RAM) 98 ™ from , h6 dis atcn6d controller commands, as this 

The packet-switched bus 90 handles all I/O data transfers out coprocessor 80 (using an 

between the automated controllers 84 and the array copro- task is all which , s d bd 

cessor 80. All transfers on the packet-switched bus 90 flow 40 eEaent^ P t 0 f the architecture enables a relatively 

Xto or from the array coprocessor 80 •» ~» f 0 tlosu"w performance microcontroller to be used, and 

to the packet-switched bus are controlled by the array 10 £ texity o{ the control program, 

coprocessor. These aspects of the bus architecture Provide reduces th p y ^ ^ a 

for a high degree of data flow performance without the » S embodiment, the microcontroller could alter- 

complexity typically associated with PCI and other peer-to- 45 th« preyed ^ ^ ^ ^ m anay 

neer type bus architectures. «mrocessor 80. This could be done, for example, by pur- 

*a7 described below, the packet-switched bus 90 uses <W™£L* 163 core (or tbe core of a comparable 

packet-based round robin protocol that guarantee ^ hat a, « ^ ^ cQre ^ an as IC 

least 1/N of the bus's I/O bandwidth will be available to eacb mi / coprocessor logic, 

driveduringeachro^^^^ 50 thatinclude m y^ ^ fo 

course of etch I/O transfer). Because this amoun (1/N) of J^^kbin response to drive failures, and for han- 
bandwidth is equal to or exceeds the sustained data transfer ^ouim ^ ^ narlICU , ar settings 

rate of each ATA drive 72 (which is typically in thi » range of dtag f&buU options , etc.) implemented by 
0 Mbytes/sec), all N drives can operate concurrently at the * af6 slored within a profile table (not 

N drives are using the packet-switched bus eacfc dnw is ~ alteraa tively be used. The automated control- 

allocated more than 1/N of the bus's bandwid* , aUow ng ASICs ^ y communicatmg with their 

each drive to transfer data at a rate which exceeds the ^ based on commands (referred to hereu. 

sustained data transfer rate (such as when the requested I/O commands „ ) received from the m.crocontro^ 

data resides in the disk drive's cache). . , er 82> ^ b y communicating with the array coprocessor 8U 

lD ,he preferred embodiment -he ^^[^ Ser pacL-switched bus to transfer I/O data. As d,s- 
implemented using an FPGA, sucti as a aiuu 
cussed be,ow, the automated controllers 84 implement a = ^ ^f^t^^

command buffer to avoid the latency normally as»a ted conuoUen — P P^ ^ snd 

,.n.K h.vino to reouest and wait for the next disk command. eralty aepena ^ _., u . J,^ ^„, r .„„ hftr , ist . 
' •^iurSer depicted by FIG. 2 1 the system incites a 

device driver 100 which is executed by the host proctor 38 5 J^^^^^. queues 10 8 within the RAM 

to enable the operating system to communicate ^ toe g^^ta^ ^ Dlrol ? er commaQ ds in sequential 

array controller 70. In the preferred embodiment, the device w. ana cusp corresponding 

driver 100 is implemented 'S^SSX*. For example, if the I/O request 

runs under the Microsoft Windows 95 or Nl °P« ra »°8 drf , 2 and 8 controller command sequences 

system. The driver 100 presents the drive array to ^the host 10 nwokes dnves 1 ^ ^ ^ ^ % 

computer 34 as a SCSI device, which in-turn en bl s be will be written o commaads with , n6re . 

array controller 70 to queue up and process multiple I/O and 8, ana m ^ w iuftj _ 

below, the array coprocessor 80 updates this table 102 (in by tne a microco otroller 82 when all of the 

response to special completion packe* received ta he 20 ^d'J^,, control , 6rs 84 nave fioi shed processing 
automated controllers 84) to noUfy the drive, 100 of the ;™ s " al ^ mmand xqwa(xs . This eliminates the 
C Z P c\ M&SSSl ™C of information between need for the microcontroller 82 to monitor the processing of 
the^P^S^ 

operation, and will be used to describe the genera operat on 25 ^g^^, addr6SS , and lrans fer information, 
of the system (including a technique for momtormg he ^^ m ™^ ock a disk oper ation, such as a 

completion status of pending I/O requests). To amplify the The c ™ nd ^ K P Qr ^ u , address references a 
drawing, the disk drives 72 and buffer 94 are omitted from ^^™£j^^ manty 40*tetotta 
tof&T*,rttew™*™^M™^»£ TmG Tto^onning ^ I/O transfer. The transfer 
single entity. Throughout the description which follows, it is 30 S >4 _(tiu_ f> P & transfcr ation> 

assumed that the number of drives N is 8. In addition he ^^^^^ ^ invo lve an exclusive-OR 
operation of the system is desc^ of data stored in the buffer 94 (FIG. 2). 

request is being processed, although multiple I/O requests
will typically be processed concurrently.

In operation, when the device driver 100 receives an I/O
request from the operating system (not shown), the device
driver assigns to the I/O request an identification number
referred to as a completion token ("token"). In the preferred
embodiment, the tokens are 4-bit values that are recycled
(reused) as I/O requests are completed. As depicted in FIG.
3, the device driver 100 passes the I/O request (in the
form of a CDB plus a scatter-gather list) and the token to the
microcontroller 82 for processing. In addition, the device
driver 100 records the token in the I/O request status table
102 to maintain a record of the pending I/O request. This
may be accomplished, for example, by setting appropriate
status flags associated with the token value.

Because the array controller 70 can process multiple I/O
requests at a time, multiple I/O requests may be recorded
within the status table 102 at any given time. As described
below, the array coprocessor 80 automatically updates the
status table 102 whenever an I/O request is completed, and
the device driver 100 monitors the status table 102 to detect
the completion of the pending I/O requests. In the preferred
embodiment, the I/O requests may be completed by the array
controller 70 in an order that is different from the order in
which the I/O requests are passed to the array controller 70.

As further illustrated by FIG. 3, the microcontroller 82
records the I/O request and the token within a "pending I/O
request" table 106 within its local RAM 98. In addition, the
microcontroller 82 translates the I/O request into one or
more drive-specific sequences of commands, referred to
herein as "controller commands." For example, if, based on
the particular RAID configuration (e.g., RAID 5) implemented
by the control program, the I/O request calls for data
to be read from or written to drives 1, 2 and 8, the
microcontroller will generate three sequences of controller [...]

As depicted in FIG. 4, the last
controller command of each sequence additionally includes
the completion token that was assigned to the I/O request, a
completion value ("disk completion value"),
and the address of the status table 102 (FIG.
3) in system memory. [...] be transferred to the
automated controller as a separate controller command. The
cumulative OR of the disk completion values assigned
to the I/O request equals a final completion value
(FFH in the preferred embodiment) [...] array coprocessor 80. For
example, [...] the following [...] can be used to produce a final value
of FFH:

Drive 1: 01H (00000001B)
Drive 2: 02H (00000010B)
Drive 8: FCH (11111100B)

As described [...], the automated controllers 84 transmit
[...] disk completion values to the array
coprocessor 80 when and as [...] 84 finish their
[...] of the I/O request (i.e., finish processing
their respective controller command sequences), and the
array coprocessor cumulatively ORs the disk completion
values [...] received to detect the [...].
This method enables the array coprocessor
to detect the completion of an I/O request
without prior knowledge of the processing details (number
of disk drives invoked, identities of invoked disk drives,
etc.) of the I/O request.

[...] the automated controllers
84 process the controller commands by communicating






6,138,176 






with their respective disk drives 72 (FIG. 2), and by sending 
packets to the array coprocessor 80 over the packet-switched 
bus 90. In the example above (drives 1, 2 and 8 invoked), the
I/O request would thus result in packets flowing from
automated controllers 1, 2 and 8 to the array coprocessor 80. 
Each controller command spawns the transmission of a 
sequence of packets (e.g., 16 packets) from the correspond- 
ing automated controller 84. (As used herein, the term 
"packet" refers generally to a block of binary data that 
includes address and control information.)

As illustrated in FIG. 5, each packet includes a transfer 
command, a target address, and an optional payload 
(depending upon the type of the packet and the availability 
of I/O data). The transfer command specifies an operation to 
be performed by the array coprocessor 80. For example, a
packet might include a READ PCI transfer command that 
instructs the array coprocessor 80 to copy a block of data 
from a specified system memory address to a specified
address in the buffer 94. For all packets other than completion
packets (discussed below), the transfer command is derived 
by the automated controller 84 from the transfer information 
(FIG. 4) included within the controller command. The target
address specifies a target location, in either the buffer 94 
(FIG. 2) or the system memory 40 (FIG. 2), to which data 
is to be transferred or from which data is to be read. 

The transfer commands that are supported by the system 
are listed and summarized in Table 1. As indicated in Table
1, if the transfer command is WRITE BUFFER, XOR
BUFFER or WRITE PCI, the payload includes disk data that 
has been read from the corresponding disk drive. In the 
example flow shown in FIG. 3, the I/O data is depicted as 
flowing from the array coprocessor 80 to system memory 40, 
as would be the case when a WRITE PCI command is 
executed. 

If, on the other hand, the transfer command is READ
BUFFER, the automated controller 84 transmits the com- 
mand and the target address, and the array coprocessor 80 
then "fills in" the payload portion with the buffer data to be 
transferred to the disk drive. Thus, although all packets 
logically flow from the automated controllers 84 to the array 
coprocessor 80, the packet-switched bus 90 is actually a
bi-directional bus that transfers I/O data in both directions
(i.e., from the automated controllers 84 to the array
coprocessor 80 and vice versa). The timing associated with packet
transfers is discussed separately below. 
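
The handling of data-carrying packets described above can be sketched in a few lines of Python. This is an illustrative software model only, not the hardware design: the names Packet and handle_packet, the use of bytearrays to stand in for the buffer 94 and the system memory 40, and the string command names are assumptions made for the example. Only READ BUFFER, WRITE BUFFER, XOR BUFFER and WRITE PCI are modeled.

```python
from dataclasses import dataclass
from typing import Optional

PAYLOAD_BYTES = 32  # 8 Dwords of 32 bits each


@dataclass
class Packet:
    command: str                     # transfer command, e.g. "WRITE BUFFER"
    target_address: int              # offset into the buffer or system memory
    payload: Optional[bytes] = None  # left empty for READ BUFFER until filled in


def handle_packet(packet: Packet, buffer: bytearray,
                  system_memory: bytearray) -> Packet:
    """Hypothetical model of how the coprocessor acts on one packet."""
    start = packet.target_address
    end = start + PAYLOAD_BYTES
    if packet.command == "READ BUFFER":
        # The coprocessor "fills in" the payload with buffer data.
        packet.payload = bytes(buffer[start:end])
        return packet
    if packet.payload is None or len(packet.payload) != PAYLOAD_BYTES:
        raise ValueError("this command needs a full 8-Dword payload")
    if packet.command == "WRITE BUFFER":
        buffer[start:end] = packet.payload
    elif packet.command == "XOR BUFFER":
        # XOR the payload into the buffer and overwrite the buffer contents.
        buffer[start:end] = bytes(
            a ^ b for a, b in zip(buffer[start:end], packet.payload))
    elif packet.command == "WRITE PCI":
        system_memory[start:end] = packet.payload
    else:
        raise ValueError(f"command not modeled in this sketch: {packet.command}")
    return packet
```

In this model the controller always originates the packet, and data moves in either direction depending on the command, which mirrors the bi-directional use of the bus described above.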

TABLE 1

TRANSFER       TARGET
COMMAND        ADDRESS           DESCRIPTION

READ           Buffer            Read data from buffer and transfer to
BUFFER         Address           automated controller. Payload = 8
                                 Dwords of buffer data.

WRITE          Buffer            Write disk data to buffer. Payload = 8
BUFFER         Address           Dwords of data read from disk.

XOR            Buffer            Exclusive OR buffer data with payload
BUFFER         Address           data and overwrite in buffer. Payload
                                 = 8 Dwords of data read from disk.

WRITE          PCI               Write payload data to system memory.
PCI            Address           Payload = 8 Dwords of data read
                                 from disk.

READ           Buffer            Read data from system memory and
PCI            Address           write to buffer. Payload = PCI address
                                 for performing read.

WRITE          PCI Address       Update internally-stored completion
PCI            of Status Table   table using token and disk completion
COMPLETE                         value included within command field.
                                 If I/O request is complete, send token
                                 to microcontroller, and use PCI address
                                 and token to update status table.
                                 No payload.

As shown in Table 1, packets that carry I/O data have a 
payload length of 8 doublewords (Dwords), where one 
doubleword = 32 bits. Thus, 16 packets are needed to move
one sector (512 bytes) of I/O data. 
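
The packet count follows directly from the payload size. A short Python check of the arithmetic is given below; the constant and function names are chosen for the example, and rounding up for transfers that are not a multiple of the payload size is an assumption that the text does not address.

```python
DWORD_BYTES = 4          # one doubleword = 32 bits
DWORDS_PER_PAYLOAD = 8   # payload length of a data-carrying packet
SECTOR_BYTES = 512

PAYLOAD_BYTES = DWORD_BYTES * DWORDS_PER_PAYLOAD  # 32 bytes per packet


def packets_needed(num_bytes: int) -> int:
    """Number of full-size payloads needed to carry num_bytes (rounded up)."""
    return -(-num_bytes // PAYLOAD_BYTES)


print(packets_needed(SECTOR_BYTES))  # 512 / 32 = 16
```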

In general, the drives invoked by an I/O request process 
their respective portions (transfers) of the request asynchronously
to one another, and can finish their respective portions
in any order. In addition, once a drive/automated
controller pair finishes processing the I/O request, the pair 
can immediately begin processing the next I/O request, even 
though other drives may still be working on the current I/O 
request. 

Whenever an automated controller 84 finishes processing 
the last controller command of a sequence of controller 
commands (indicating that the automated controller has
finished its respective portion of the I/O request), the automated
controller generates a special packet (referred to as a
"completion packet") which includes the WRITE PCI
COMPLETE command (Table 1). An I/O request can produce
as few as one completion packet (if only one drive is
invoked) and as many as eight completion packets (if all 
eight drives are invoked), and the completion packets can 
arrive at the array coprocessor 80 in any order. Each 
completion packet includes the token, the disk completion 
value, and the status table (PCI) address that are appended 
to the last controller command (FIG. 4) of the sequence. The 
token and disk completion value are included within the 
packet's command field, and the status table address is 
included within the address field. 

As the completion packets associated with the I/O request 
(token) are received, the array coprocessor 80 cumulatively 
ORs the completion values together to determine whether 
any other disk drives are still working on the I/O request. 
The logic circuit used to perform this task is shown in FIG.
8 and is discussed separately below. With the exception of
the last completion packet of an I/O request, the array
coprocessor 80 does not take any external action in response
to receiving the completion packets. 
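
The cumulative OR test can be modeled in software as follows. This is a sketch under stated assumptions, not the logic circuit of FIG. 8: the class name CompletionTracker, the per-token dictionary, and the example completion values in the usage note are hypothetical, and the final completion value is taken to be FFH as in the preferred embodiment.

```python
FINAL_COMPLETION_VALUE = 0xFF  # FFH, assumed per the preferred embodiment


class CompletionTracker:
    """Hypothetical software model of per-token completion detection."""

    def __init__(self) -> None:
        # token -> cumulative OR of the completion values received so far
        self.accumulated = {}

    def on_completion_packet(self, token: int, disk_completion_value: int) -> bool:
        """OR in one completion value; return True when the request is done."""
        value = self.accumulated.get(token, 0) | disk_completion_value
        if value == FINAL_COMPLETION_VALUE:
            # Clear the entry so the token can be recycled for a later request.
            self.accumulated.pop(token, None)
            return True
        self.accumulated[token] = value
        return False
```

For example, if three drives were assigned the values 01H, 02H and FCH, the packets may arrive in any order and only the third call returns True. The tracker never needs to know how many drives were invoked or which ones.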

As further illustrated by FIG. 3, once the result of the 
cumulative OR operation equals the final completion value 
(indicating that the last completion packet has been 
received, and that all drives have finished processing the I/O
request), the array coprocessor 80 performs two basic tasks. 
The first task is to interrupt the microcontroller 82 and 
transmit the token (over the local control bus 86) to the 
microcontroller 82. The microcontroller 82 responds to the 
interrupt by removing the I/O request from the "pending I/O 
request" table 106 to reflect that the request has been 
completed. In general, if a pending I/O request is not 
removed from the table 106 within a certain timeout period, 
the microcontroller 82 invokes an error processing routine to 
process the timeout error. . 
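
The cumulative-OR tracking described above can be illustrated with a
short software sketch. This is only a model of the idea, not the
hardware circuit itself; the class name, the 16 token slots, and the
use of FFH as the final completion value are assumptions made for
illustration.

```python
FINAL_COMPLETION_VALUE = 0xFF  # assumed final value (all eight bits set)


class CompletionTracker:
    """Software model of per-token cumulative-OR completion tracking."""

    def __init__(self, num_tokens: int = 16) -> None:
        # One cumulative value per token; the slot count is an assumption.
        self.registers = [0x00] * num_tokens

    def on_completion_packet(self, token: int, disk_completion_value: int) -> bool:
        """OR in one completion value; return True if the request is done."""
        cumulative = self.registers[token] | disk_completion_value
        if cumulative == FINAL_COMPLETION_VALUE:
            # Last completion packet: clear the slot so the token can be reused.
            self.registers[token] = 0x00
            return True
        self.registers[token] = cumulative
        return False


# A request split across two drives; the packets may arrive in either order.
tracker = CompletionTracker()
first = tracker.on_completion_packet(token=3, disk_completion_value=0xF0)
second = tracker.on_completion_packet(token=3, disk_completion_value=0x0F)
```

Because the check depends only on the accumulated value, the tracker
needs no knowledge of how many drives an I/O request involves.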

The second task performed by the array coprocessor 80 is
to update a status entry in the status table 102 to indicate to
the device driver 100 that processing of the I/O request is
complete, and then set an interrupt flag (if not already set)
to the host processor 38 to generate an interrupt request. The
update to the status table 102 may be made, for example, by
using the PCI address (included within the completion
packet) [...] using the token value as an offset. [...] a
completion flag associated with the token (I/O request) may
then be set. Because only the last completion packet
produces an update to the status table 102, the status table
address may alternatively be omitted from all but one of the
completion packets for the I/O request, in which case the
array coprocessor 80 may be configured to buffer the address
(in association with the corresponding token) until [...].

In another embodiment of the invention, the completion
packets include a payload that carries a pointer that is
meaningful to device driver 100, and the array coprocessor
80 writes this pointer to the status table 102 when the last
completion packet is received. The pointer is preferably a
value which identifies the [...] 100 or the operating system.
For example, [...] be an identifier or system memory address
of a SCSI request block (SRB) or an I/O request packet (IRP).
The advantage of this alternative implementation is that it
eliminates the need for the device driver 100 to use a separate
lookup table to match the token number to the pending I/O
request. As with the tokens, the pointer values are preferably
passed to the microcontroller 82 by the device driver 100
(with the I/O requests) and embedded within the last
controller command of each drive-specific sequence. The
pointer values may also serve as the tokens themselves, in
which case separate token values may be omitted.

While the interrupt request to the host processor 38 is
pending, the array controller 70 continues to process pending
I/O requests, and continues to update the status table 102
as additional I/O requests are completed. When the host
processor 38 eventually processes the interrupt request, the
device driver 100 accesses the status table 102 to determine
which of the pending I/O requests have been completed.
When the device driver 100 determines that a given I/O
request has been completed, the device driver notifies the
operating system of such, and removes the I/O request from
the status table 102. This feature of the architecture (i.e., the
ability to process multiple I/O requests per interrupt)
significantly improves the performance of the host computer 34
by reducing the frequency at which the host processor 38 is
interrupted. To take advantage of this feature, the device
driver 100 is preferably configured to make use of deferred
procedure calls to defer the processing of the interrupts.

As will be apparent from the foregoing, an important
benefit of the present architecture is that the microcontroller
82 does not have to monitor the constituent disk operations
of the I/O request to ensure that each [...]. A related benefit,
which is described further below, is that the array coprocessor
80 does not require logic for correlating the constituent disk
operations to the pending I/O requests. Both of these features
are enabled in part by the use of tokens and completion
values to track the completion of I/O requests.

Another benefit of the architecture is that the
microcontroller 82 is effectively removed from the I/O data
path. This reduces the complexity of the control program
[...] a less expensive microcontroller to be used. Another
benefit is that the flow of command information to the
automated controllers 84 does not interfere with the flow of
I/O data, since separate busses are used for the two.

It will be appreciated that the above-described method for
monitoring the completion of I/O requests can also be used
[...] invoked by the I/O request would still be assigned a
unique [...] controller have finished processing the I/O [...]
the I/O requests that [...] identically to the I/O requests
generated by the [...]. For example, the device driver could be
[...] to combine multiple I/O requests together for [...] above
described method could be used [...] completion of these
combined I/O requests.

[...] reference [...] the array coprocessor 80, the [...]
automated controller 84. [...] "AC" is used to refer to [...]
and subscripts are used to denote correspondence with
drives 1-8.

[...] interconnect [...] 80 to the automated controllers 84 to
form the packet-switched [...] (FIG. 2) include a bus [...]
drive-specific request (REQ) and grant [...] line 120 to all
[...] signal [...] transfers on the packet-switched bus. In [...]
embodiment, the bus clock is a 33 MHz [...] of packet data
occur at a rate of 32 bits ([...] doubleword) per clock cycle.
In other embodiments, a [...] to accommodate faster [...]
numbers of disk drives.

[...] all packet data that [...] packet-switched bus. All packet
transfers on this 32-bit bus 90A occur between the array
coprocessor [...] information flowing in one direction [...]
80) and [...] in both directions.

Each automated [...] is connected to the array coprocessor
[...] line 124 (labeled REQ [...]) [...] lines [...] the round
robin arbitration protocol, [...] are used by the [...] 84 to
request timeslots on the packet-switched bus 90, and the
grant lines 126 are used to grant the bus to the individual
automated controllers 84. [...] coprocessor [...] frames of
packets on the packet-switched bus [...]. A [...]
implementation of the arbitration protocol is discussed below
with reference to FIG. 7.

[...] each automated controller [...] microcontroller 82 by a
respective ready [...]. [...] ready line 130 [...] the respective
automated controller 84 to request new controller commands
[...]. [...] described below, the automated controllers double
buffer the controller commands, so that the next controller
command (if available) will be queued up within the
automated controller 84 when the current [...] controller)
input of the Siemens 163 [...] 82. The use of PECs provides a
mechanism [...] automated controllers 84.
[...] but are collectively denoted by reference number 86A
[...].

[...] packets received from the automated [...] according to
the transfer commands set forth in Table 1 above. A FIFO
memory (not shown) [...].

[...] each packet received by the automated packet processor
136 is a self-contained entity which fully specifies [...]
(including any target address) [...] by the array coprocessor
80. For example, when a packet containing a WRITE PCI
transfer command is received, the array coprocessor simply
writes the payload data to the target PCI address specified
within the packet, without regard to either the source (disk
drive) of the [...] or the I/O request to which the data
corresponds. In this respect, the array coprocessor 80 acts
essentially as a stateless [...] transfer [...] automated
controllers 84 (the "clients") without [...] know the details of
the underlying I/O requests. An important benefit of this
feature is that the logic circuitry of the array coprocessor 80
is significantly less complex than would be possible if, for
example, the array coprocessor had [...].

[...] automated packet processor 136 also includes a [...]
interfaces 138, 140 when the last completion packet of an
I/O request is received. Assertion of this interrupt signal
causes the microcontroller interface 140 to interrupt the
microcontroller 82, and causes the PCI interface [...] 38. The
completion logic circuit 144 is described [...].

[...] interface 138 asserts a PCI request line (not shown)
[...] the mailbox 150 to microcontroller interface 140. I/O
requests written to the mailbox are passed to the
microcontroller 82 for processing.

[...] generally dictated by the [...] packet-switched bus to
the corresponding [...].

Architecture and General Operation of Automated Controllers

[...] further reference to FIG. 6, each automated controller
[...] circuit 176. The signal lines [...] corresponding [...] of
which form part of a standard ATA [...] connected to an
internal 16-bit data bus 182 [...] 80. As illustrated in FIG. 6,
the [...] buffer 180 for storing controller commands that have
been received from the microcontroller 82.

The read [...] to temporarily store I/O data [...] depicted in
FIG. 6, data is written into the [...] implementation, data is
written into the read FIFO at the disk [...]. (The sustained
transfer rates for these drives are [...] less because of seek
times.) Data is read from [...] (during allocated timeslots)
and [...] bytes/cycle = [...] a doubleword at a time (at the
132 [...]) [...] read [...] FIFO [...] 16 doublewords (two
packets) of I/O data,
The transfer/command control circuit 176 includes logic
for performing the following operations: (i) [...] controller
commands from the microcontroller 82 [...] command buffer
180, so that the command buffer contains the next controller
command (if available) when processing of the current
controller command is completed, (ii) processing controller
commands received from the microcontroller 82 to generate
transfer commands to pass to the disk drive 72, (iii)
implementing the "host" side of the ATA protocol to
communicate with the ATA drive 72, (iv) generating the
headers (address and command fields) of packets to be
transmitted on the packet-switched bus 90, and gating the
header data onto the data bus 90A; (v) controlling the flow
of data into and out of the read and write FIFOs 170 and 172,
and (vi) generating request (REQ) signals and monitoring
grant (GNT) signals to implement the "client" side of the
arbitration protocol. The logic circuitry used to implement
these operations is discussed below under the heading
TRANSFER/COMMAND CONTROL CIRCUIT.

[...] the command buffer 180 is empty. Assertion of the RDY
line 130 causes the microcontroller 82 to issue [...] controller
command to the automated controller 84 from the
corresponding queue 108 (FIG. 3). If no controller command
is currently in the queue, the microcontroller issues the
controller command when it becomes available (such as when
a new I/O request is received from the host computer).
When the microcontroller 82 issues a controller command to
[...], the transfer/command control circuit 176 stores the
command block portion (FIG. 4) of the controller command
in the command buffer 180 and deasserts the RDY line 130.

When the ATA drive becomes ready, the transfer/command
control circuit 176 writes the command block to the drive for
processing. The command block includes the [...] with I/O
data. Once the command block is written to the disk drive
72, the command buffer [...] transfer/command control
circuit 176 reasserts the RDY line 130 to request a new
controller command. As described below, the target address
and other information needed to complete the transfer over
the packet-switched bus is maintained in separate registers
280 (FIG. 9).

In typical ATA implementations, a period of inactivity or
"dead period" occurs while the ATA drive fetches the next
disk command from the host computer. This dead period
adversely affects the net throughput of the disk drive. In the
preferred embodiment, the architecture of the control
program is such that the next controller command (if
available) will be written to the command buffer 180 before
the disk drive 72 finishes processing the current disk
operation. Thus, the latency that would normally be
associated with having to fetch a new controller command
from the microcontroller 82 is avoided. This feature of the
architecture enables a high degree of performance to be
achieved using low-cost ATA drives.

During the processing of the disk operation, the transfer/
command control circuit 176 repeatedly asserts its request
[...]. [...] disk operation is a sector read, the transfer/command
control circuit 176 [...] of packet is transferred, the [...] 176
increments an internal [...] number of bytes that have [...],
and uses the counter value to generate [...] target addresses
to insert within the headers (FIG. 5) of the packets.

The transfer/command control circuit 176 determines
whether to [...] the request line 124 either by monitoring the
state of the read FIFO 170 (if the disk operation is a disk
read) or by monitoring the state of the write FIFO 172 (if the
disk operation is a disk write). Specifically, for disk read
operations, the transfer/command control circuit 176 asserts
the request line 124 whenever the read FIFO 170 contains at
least one packet (8 doublewords) of I/O data; and for disk
write operations, the transfer/command control circuit 176
asserts the request line 124 whenever the write FIFO 172 has
sufficient room to receive at least one packet of I/O data.

[...] automated controller 84 asserts its request line [...], the
automated controller will be granted a timeslot in which [...]
of the bus design is [...] time period [...] approximately equal
to the time [...] other automated controllers 84 to [...]
packets. This maximum time period is [...] such that (i) on
disk read [...] completely [...] portions of data stored in the
buffer 94, the write FIFO [...] never prematurely become
empty. An [...] benefit of this feature is that [...] insufficient
bandwidth on [...].

[...] coprocessor 80 includes an [...] round robin protocol.
The arbitration [...] based on the [...] 124 from the automated
[...] based on transfer status information [...] from the
automated packet processor 136. The automated controllers
84 assert their respective request lines [...] lines can be
asserted [...] of the bus clock [...].

FIG. 7 is a flow [...] arbitration protocol [...] disk drive [...]
varies between 1 and 8. As [...] none of [...] the state
machine [...] request (REQ) line [...] loop [...] state machine
142 [...] the bus clock 120 to sample [...] move on to the
next request line. [...] when none of the request lines 124 are
active, the state machine [...] per clock cycle.
[...] (on the same clock cycle) asserts the corresponding [...]
controller 84. On the same clock cycle the array coprocessor
80 receives the transfer command (FIG. 5) from the
automated controller 84; and on the following clock cycle
the array coprocessor 80 receives the target address from the
automated controller 84.

As depicted by blocks 212 and 218, the state machine
then communicates with the automated packet processor 136
(FIG. 6) to determine whether or not [...] a payload. [...]
command is WRITE PCI COMPLETE [...] the transfer
command is READ BUFFER and [...] is not yet available in
the buffer 94 (block 216). In either of these two cases, the
state machine 142 deasserts the grant line 126 (block 216) to
terminate the timeslot, and returns to the sampling loop.

As represented by block 220, if neither of the above
conditions is met, the state machine 142 continues to assert
[...] coprocessor 80 to automated controller 84, an extra clock
cycle is used as a "dead period" between the header
transmission by the automated controller 84 and the payload
transmission by the array coprocessor 80.

An important aspect of this arbitration protocol is that
when a disk drive does not use its timeslot, the timeslot is
effectively relinquished for other drives to use. Thus, in
addition to guaranteeing that 1/N of the total bandwidth
will be available to every drive at all times (during
every round robin cycle), the protocol enables the drives to
use more than 1/N of the total bandwidth when one or more
drives are idle. A drive may be able to use this additional
bandwidth, for example, if a cache hit occurs on a disk read,
allowing the drive to return the requested data at a rate which
is considerably higher than the drive's sustained transfer
rate.

Although the system of the preferred embodiment uses
drive-specific request and grant lines 124, 126 to implement
the round robin protocol, a variety of alternative techniques
are possible. For example, the array coprocessor 80 could
transmit periodic synchronization pulses on a shared control
line to synchronize the automated controllers 84, and each
automated controller could be preprogrammed via the
control program to use a different timeslot of a frame. [...]
implement a protocol in which the bus is granted to the
automated controller 84 that least-recently accessed the
packet-switched bus 90.

[...] array coprocessor 80, and illustrates the general flow of
information that takes place whenever a completion packet
is received. As described above, the purpose of the circuit
144 is to monitor the tokens and disk completion values
contained within completion packets to detect the completion
[...] the interrupt flag [...]. [...] 144 includes a register file
240, an 8-bit [...] (labeled 0-F). Each register 248
corresponds to a [...] and holds the result of the [...]
corresponding I/O request.

[...] described above, the tokens are assigned to pending I/O
[...] the device [...] the I/O requests are passed [...]. [...]
given [...] each assigned [...] corresponds uniquely to a
different pending I/O request. Thus, in the implementation
depicted in FIG. 8, up to [...] I/O requests can be pending
simultaneously.

[...] values are generated by the control [...] a lookup table),
and are assigned such that the [...] completion values [...]
equals FFH. For example, for an I/O request that only
requires access to one drive, a single [...] disk [...] involves
all eight disk [...].

[...] the token and the [...] completion value are extracted
from [...] inputs to the completion logic circuit 144. [...]
depicted in FIG. 8, the token is used to [...] the corresponding
[...] will be [...] the first pass) to [...] file and fed as an input
to the OR [...]. The cumulative OR value is then ORed with
the disk completion value to generate a new completion
value, [...] back to the same [...] 248. [...] the [...] completion
value of [...] (indicating that the last completion packet has
been received), the compare circuit 244 asserts [...] signal
(not shown) which causes the addressed location in the
register file 240 to be [...].

[...] coprocessor 80 to detect the [...] request without any
prior information [...] the number of drives involved [...]
enables the [...] to be rapidly posted to the host [...]
regardless of the order in which the disk drives [...] portions
of the I/O [...].

[...] of the system. To simplify the [...] 176 [...] generating
request (REQ) [...] transfer/command control circuit 176
includes a transfer engine 260 and a command engine 262
[...] connected by a START line 264, a DONE line [...] bus
272. [...] engines 260, 262 include state [...] "host" side of
the ATA protocol (including Ultra ATA). In typical ATA
implementations, the host side of the ATA protocol is
implemented through firmware. By automating the host side
of the protocol (i.e., implementing the host side purely within
hardware), a high degree of performance is achieved without
the need for complex firmware.

The transfer engine 260 interfaces with the ATA drive 72
via a set of standard ATA signal lines, including chip selects
179A, strobes 179B, and an I/O ready line 179C. The
transfer engine 260 also includes a set of FIFO control lines
276 that are used to control the flow of data into and out of
the read and write FIFOs 170, 172.

The command engine 262 connects to the microcontroller
82 via the ready (RDY) line 130 and the local control bus
86A, and connects to the array coprocessor 80 via the 32-bit
data path 90A of the packet-switched bus. The command
engine 262 connects to the ATA drive 72 via the 16-bit ATA
data bus 178 and the ATA drive's interrupt request (IRQ) line
179D. Included within the command engine 262 are the
command buffer 180 and a set of registers 280. As discussed
below, the registers 280 are used to hold information (target
addresses, etc.) associated with the controller commands.

The transfer engine 260 supports three types of disk
transfer operations: a 1-cycle STATUS READ, an 8-cycle
COMMAND WRITE, and a 256-cycle DATA TRANSFER.
These operations are initiated by the command engine by
asserting the START signal line 264 and driving the transfer
command bus 272 with a command code. When a STATUS
READ is performed, the transfer engine 260 reads the ATA
drive's status register (not shown), and routes the status
information to the command engine 262. When a COMMAND
WRITE is performed, the transfer engine 260 gates
the contents of the command buffer 180 onto the drive's data
bus 178 to copy a command block (FIG. 4) to the drive.
When a DATA TRANSFER is performed, the transfer
engine 260 transfers one sector of I/O data between the drive
and either the read FIFO 170 or the write FIFO 172.
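
As a quick consistency check on the cycle counts above, the following
sketch computes how many bytes a DATA TRANSFER moves over the 16-bit
ATA data bus. The constant and function names are hypothetical, and
the 512-byte sector size is the conventional ATA value, assumed here
rather than stated in the passage.

```python
ATA_DATA_BUS_BYTES = 2  # 16-bit ATA data bus 178: two bytes per cycle

# Cycle counts of the three transfer-engine operations named in the text.
CYCLES = {"STATUS READ": 1, "COMMAND WRITE": 8, "DATA TRANSFER": 256}


def data_transfer_bytes(sectors: int) -> int:
    """Bytes moved by `sectors` back-to-back DATA TRANSFER operations."""
    return sectors * CYCLES["DATA TRANSFER"] * ATA_DATA_BUS_BYTES


one_sector = data_transfer_bytes(1)  # 256 cycles x 2 bytes = 512 bytes
```

Under these assumptions, one 256-cycle DATA TRANSFER corresponds to
exactly one 512-byte sector.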

With further reference to FIG. 9, the transfer/command
control circuit 176 processes controller commands generally
as follows. Whenever the command buffer 180 is empty, the
command engine 262 asserts the RDY line 130 to request a
new controller command from the microcontroller 82. When
the microcontroller 82 returns a controller command, the
command engine 262 deasserts the RDY line 130 and parses
the controller command. The command block (FIG. 4) is
written to the command buffer 180, and the remaining
portions of the controller command (target address, transfer
information, and any completion information) are written to
the registers 280.
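
The RDY handshake and the split of a controller command between the
command buffer and the registers can be modeled in software roughly as
follows. This is an illustrative sketch only: the class names, field
names, and integer encodings are hypothetical, and the real mechanism
is a hardware circuit rather than code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControllerCommand:
    command_block: bytes                   # portion destined for the ATA drive
    target_address: int                    # held in the registers
    transfer_info: int = 0                 # hypothetical encoding
    completion_info: Optional[int] = None  # present only on the last command


class CommandEngine:
    """Model of the RDY handshake: RDY is asserted while the buffer is empty."""

    def __init__(self) -> None:
        self.command_buffer: Optional[bytes] = None
        self.registers: dict = {}
        self.rdy = True  # the buffer starts empty, so RDY is asserted

    def receive(self, cmd: ControllerCommand) -> None:
        """Accept a controller command from the microcontroller."""
        if not self.rdy:
            raise RuntimeError("command buffer is full; RDY is deasserted")
        self.rdy = False
        self.command_buffer = cmd.command_block
        self.registers = {
            "target_address": cmd.target_address,
            "transfer_info": cmd.transfer_info,
            "completion_info": cmd.completion_info,
        }

    def command_write(self) -> bytes:
        """Hand the command block to the drive; the buffer empties, RDY rises."""
        if self.command_buffer is None:
            raise RuntimeError("no command block buffered")
        block = self.command_buffer
        self.command_buffer = None
        self.rdy = True
        return block


engine = CommandEngine()
engine.receive(ControllerCommand(command_block=b"\x01\x02", target_address=0x1000))
block = engine.command_write()
```

Emptying the buffer as soon as the block reaches the drive is what lets
the next controller command be fetched while the drive is still busy.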

At this point, the command engine 262 waits until
processing of any ongoing disk operation is complete. Once
processing is complete, the command engine implements the
sequence shown in FIG. 10 (discussed below) to control the
operation of the disk drive 72. In addition, if the controller
command calls for data to be written to the disk drive 72 and
the write FIFO 172 is available, the command engine 262
begins to generate and send packets on the packet-switched
bus to initiate the filling of the write FIFO 172.

FIG. 10 illustrates the sequence of transfer operations that
are initiated by the command engine 262. The command
engine initially requests a STATUS READ operation to
check the status of the drive. If the result of the STATUS
READ indicates that firmware intervention will be required
(not shown in FIG. 10), the command engine 262 reports the
error to the microcontroller 82, and the microcontroller
enters into an appropriate service routine. If no errors are
reported, the command engine 262 initiates a COMMAND
WRITE operation to transfer the command block from the
command buffer 180 to the ATA drive 72. This causes the
command buffer 180 to become empty, which in turn causes
the command engine 262 to reassert the RDY line 130. The
command block may specify a transfer of zero sectors, one
sector, or multiple sectors.

After the drive 72 returns from the COMMAND WRITE 
operation (by asserting the IRQ line 179D), the command 
engine 262 either (i) initiates a new STATUS READ opera- 
tion (if no data transfer is required) to begin processing of 
the next controller command, or (ii) initiates a 256-cycle 
DATA TRANSFER operation to transfer one sector of data 
between the disk drive and one of the FIFOs 170, 172. When 
a DATA TRANSFER operation is completed, the command 
engine 262 either returns to the STATUS READ state, or, if 
additional sector transfers are needed, initiates one or more 
additional DATA TRANSFER operations. 
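
The sequence described for FIG. 10 can be summarized as a small
sketch that lists the transfer operations issued for one controller
command. The function name and the "REPORT ERROR" label are
hypothetical; waiting for the drive's IRQ between steps is not
modeled.

```python
from typing import List


def transfer_sequence(sector_count: int, status_ok: bool = True) -> List[str]:
    """Return the transfer operations issued for one controller command."""
    ops = ["STATUS READ"]
    if not status_ok:
        # Firmware intervention is needed: the error goes to the microcontroller.
        ops.append("REPORT ERROR")
        return ops
    ops.append("COMMAND WRITE")
    # One DATA TRANSFER per sector; a zero-sector command skips this step.
    ops.extend(["DATA TRANSFER"] * sector_count)
    return ops


two_sector_read = transfer_sequence(2)
no_data_command = transfer_sequence(0)
```

After the last operation in the returned list, processing returns to
the STATUS READ state for the next controller command.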

One benefit to using automated ATA controllers (as 
opposed to firmware) is that on read operations, the data can 
be retrieved from the drive as soon as it is available. In 
addition to reducing latency, this aspect of the design 
enables ATA drives with smaller buffers to be used without 
the usual loss in performance. 

Although this invention has been described in terms of
certain preferred embodiments, other embodiments that are
apparent to those of ordinary skill in the art are also within
the scope of this invention. Accordingly, the scope of the
present invention is intended to be defined only by reference
to the appended claims.

In the claims which follow, reference characters used to 
designate claim steps are provided for convenience of 
description only, and are not intended to imply any particular 
order for performing the steps. 
What is claimed is: 

1. A disk array controller which operatively connects a
host computer to an array of disk drives, the host computer
including a system memory, the disk array controller
comprising:

a plurality of disk drive controllers, each disk drive 
controller connected to and configured to control at 
least one disk drive of the array; 
a microcontroller which dispatches controller commands 
to the disk drive controllers over a control bus to initiate 
transfers of input/output (I/O) data between the disk 
drives and the host computer, at least some of the 
controller commands including system memory 
addresses for performing said transfers, the microcon- 
troller responsive to I/O requests generated by the host 
computer; and 
an automated processor which transfers I/O data between 
at least the disk drive controllers and the system 
memory, the automated processor connected to the 
plurality of disk drive controllers by a packet bus which 
is separate from the control bus, and connected as a bus 
master to a bus of the host computer, the automated 
processor responsive to transfer commands and target 
addresses received from the disk drive controllers over 
the packet bus, at least some of the transfer commands 
specifying transfers of I/O data between the disk drive 
controllers and the system memory. 
2. The disk array controller according to claim 1, wherein 
the disk drive controllers generate packets that are trans- 
ferred to the automated processor over the packet bus, at 
least some of the packets including (i) a block of I/O data 
read from the disk drive array, and (ii) a transfer command 
and a target address which specify how the block of I/O data 
is to be routed by the automated processor. 
3. The disk array controller according to claim 1, wherein
each disk drive controller is configured to generate a packet
that includes a system memory write command, a target
write address, and a block of I/O data, and wherein the
automated processor responds to the system memory write
command by writing the block of I/O data to the system
memory based on the target write address.
4. The disk array controller according to claim 1, wherein
at least one of the disk drive controllers is configured to
generate a packet that includes a transfer completion
command, the transfer completion command indicating that
the disk drive controller has finished processing an I/O
request, and wherein the automated processor responds to
the transfer completion command by determining whether
all disk drive controllers invoked by the I/O request have
finished processing the I/O request.
5. The disk array controller according to claim 1, further
comprising a buffer coupled to the automated processor,
wherein the automated processor stores I/O data in the buffer
and is responsive to transfer commands which specify
transfers of I/O data to and from the buffer.
6. The disk array controller according to claim 5, wherein
each disk drive controller is configured to generate a packet
that includes a system memory read command and wherein
the automated processor responds to the system memory
read command by transferring I/O data from the system
memory to the buffer.
7. The disk array controller according to claim 5, wherein
each disk drive controller is configured to generate a packet
that includes a buffer read command, and [...] automated
[...].
[...] the automated processor controls all accesses by the disk
drive controllers to the packet bus.
9. The disk array controller according to claim 8, wherein
the automated processor grants the packet bus to individual
disk drive controllers using a round robin arbitration
protocol.
10. The disk array controller according to claim [...],
wherein the automated processor implements a bus
arbitration protocol which guarantees that at least 1/N of the
I/O bandwidth of the packet bus is available to each disk
drive controller, where N is the number of disk drive
controllers.
11. The disk array controller according to claim 1,
wherein each disk drive controller is an automated controller
which controls a single, respective disk drive of the array.
12. The disk array controller according to claim [...],
wherein each disk drive controller is an automated ATA
controller which controls a single, respective ATA disk
drive.
13. The disk array controller according to claim 12,
wherein at least one of the automated ATA controllers
includes a command buffer which buffers multiple disk drive
commands, so that a new disk drive command can be
dispatched to the respective ATA disk drive immediately
upon completion of a pending disk drive command.
14. The disk array controller according to claim 1,
wherein the microcontroller and the automated processor are
integrated within a common semiconductor device.
15. The disk array controller according to claim 1,
wherein the microcontroller runs a control program which
implements a RAID configuration.
16. The disk array controller as in claim 1, wherein the
disk array controller processes multiple I/O [...] at a
time, and the automated processor transfers the I/O data
[...] I/O [...].
[...] A disk array controller which operatively connects a
host computer to an array of disk drives, the host computer
including a system memory, the disk array controller
comprising:
a plurality of disk drive controllers, each disk drive
controller connected to and configured to control at
least one disk drive of the array;
a microcontroller which dispatches controller commands
to the disk drive controllers over a first bus to initiate
transfers of input/output (I/O) data between the disk
drives and the host computer, at least some of the
controller commands including system memory
addresses for performing said transfers, the
microcontroller responsive to I/O requests generated by
the host computer; and
an automated processor which transfers I/O data between
[...] disk drive controllers and the system [...]
commands and target [...] addresses received from the
disk drive controllers, the automated processor
connected to the [...] of disk drive controllers by a
second bus which is separate from the first bus.
[...] the disk drive controllers generate packets that are
transferred to the automated processor over the second bus,
at least some of the packets including (i) a block of I/O data
read from the disk drive [...], and (ii) a transfer command
and a target system memory [...] which [...].
[...] wherein the [...] bus.
[...] The disk array controller according to claim 19,
wherein the automated processor grants the second bus to
individual disk [...] a round robin [...].
21. The disk array controller according to claim 17,
wherein the automated processor and the disk drive
controllers implement an arbitration protocol which
guarantees [...] bandwidth of the second bus is available to
each disk drive controller, where N is the number of [...]
controllers.
[...] The disk array controller according to claim 17,
wherein each disk drive controller is configured to generate
a [...] completion command which indicates that the [...]
processing a [...] completion command by determining
whether all disk drive controllers invoked by the pending
I/O request have finished processing the I/O request.
[...] The disk array controller according to claim 17,
wherein each disk drive controller is an automated controller
which controls a single, respective disk drive of the array.
[...] array is an ATA drive.
[...] The disk array controller as in claim 17, wherein the
disk array controller processes multiple I/O requests at a
time, and [...] transfers the I/O [...] controllers and the
system memory [...] of the I/O requests to which the
I/O data corresponds.
[...] system which comprises a host computer and an array
of disk drives, a method of processing an
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inpu.Zou.put (1/0) reques. gela.ed by .he host computer, ^X^^ 

the method comprising:

(a) dispatching at least one controller command to a disk
drive controller to initiate a transfer of I/O data from a
disk drive controlled by the disk drive controller to a
system memory of the host computer, the controller
command including a system memory address which
specifies a location to which the I/O data is to be
transferred;

(b) dispatching controller commands to a plurality of
other disk drive controllers to initiate transfers of I/O
data from other disk drives of the array to the system
memory;

(c) responding to the controller command dispatched in
step (a) with the disk drive controller by (i) reading the
I/O data from the disk drive, and (ii) generating and
sending a sequence of packets to an automated
[illegible] the I/O data included therein to [illegible]

[illegible] wherein step (a) [illegible]

[illegible] sends the I/O request identifier and a completion
command to the automated processor upon finishing
processing of the [illegible]

[illegible] according to claim 26, wherein step (a) is
performed by a microcontroller which runs a control
program, the control program implementing a RAID storage
[illegible]

[illegible] method according to claim [illegible], wherein the
automated processor is coupled to a buffer which temporarily
stores I/O data, and wherein at least some of the
packets of the sequence of packets include transfer commands
which specify transfers of I/O data to and from the
[illegible]

[illegible] In a disk array controller [illegible]
automated processor:

receiving packets over a bus [illegible] a plurality of
[illegible] controllers invoked by the I/O request, at
[illegible] I/O data read from [illegible] disk drives and
including transfer commands and [illegible] system memory
addresses that fully specify to the automated processor
transfers of the I/O data to a system memory of the host
computer; and

transferring the I/O data to the system memory of the host
computer according to the transfer commands and the
[illegible] packets without regard to an identity of the
I/O request.

[illegible] bus [illegible] receiving from one of the disk
[illegible] of packets that each fully [illegible]
operation to be performed by the automated [illegible]

[illegible] completion of the I/O request without prior
knowledge of the [illegible]

[illegible] The method according to claim 35, wherein
[illegible]

[illegible] specifies at least [illegible] system memory;

[illegible] from the disk drive according to the [illegible]

[illegible] data on a bus as a sequence of [illegible]
at least some of the packets include [illegible]

42. The method as in claim 41, wherein [illegible] I/O data
from the disk drive comprises implementing an ATA protocol
within automated [illegible] of the disk drive controller.

[illegible] drive within a buffer of the disk drive
controller, [illegible]

44. The method as in claim 41, wherein the controller
command is received over a bus which is separate from the
[illegible]

45. The method as in claim 41, wherein transmitting the
I/O [illegible] a transfer command, and a target system
[illegible]



46. [illegible] completion of the I/O request by the disk
drive controller, wherein the completion value is adapted
to be combined with completion values transmitted by other
disk drive controllers to determine whether processing of
the I/O request is complete.

47. The method according to claim 41, wherein transmitting
the I/O data on the bus comprises transmitting each
packet during an assigned timeslot.
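
As an illustration only, the following Python sketch models two mechanisms recited in the claims: granting bus timeslots in round robin order, so that each of N disk drive controllers receives at least 1/N of the slots, and combining completion values from several disk drive controllers to detect that an I/O request is complete. The 8-bit FINAL_VALUE, the bitwise-OR combining rule, the way values are handed out, and every function and class name are assumptions made for this sketch; the claims do not state how completion values are encoded or combined.

```python
FINAL_VALUE = 0xFF  # assumed 8-bit pattern meaning "request complete"


def round_robin_grants(num_controllers: int, num_slots: int) -> list[int]:
    """Return the index of the controller granted the bus in each timeslot."""
    return [slot % num_controllers for slot in range(num_slots)]


def assign_completion_values(num_invoked: int) -> list[int]:
    """Hand out one value per invoked controller; their OR is FINAL_VALUE.

    Hypothetical stand-in for the values the microcontroller generates.
    """
    if not 1 <= num_invoked <= 8:
        raise ValueError("this sketch supports 1 to 8 invoked controllers")
    # The first controllers each get a single bit ...
    values = [1 << i for i in range(num_invoked - 1)]
    # ... and the last controller gets all remaining bits.
    values.append(FINAL_VALUE & ~sum(values))
    return values


class CompletionLogic:
    """Combines completion values per I/O request identifier (assumed OR rule)."""

    def __init__(self) -> None:
        self._partial: dict[int, int] = {}

    def on_completion(self, request_id: int, value: int) -> bool:
        """OR in one controller's value; return True once the request is done."""
        combined = self._partial.pop(request_id, 0) | value
        if combined == FINAL_VALUE:
            return True
        self._partial[request_id] = combined
        return False
```

In this model, round_robin_grants(4, 12) grants each of four controllers three of the twelve slots, which is exactly 1/4 of the bus. CompletionLogic needs no advance knowledge of how many controllers a request invokes, because it only compares the running OR against FINAL_VALUE.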
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